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Abstract 

These lectures given to graduate students in high energy physics, provide an introduction to 
Monte Carlo methods. After an overview of classical numerical quadrature rules, Monte Carlo 
integration together with variance-reducing techniques is introduced. A short description 
on the generation of pseudo-random numbers and quasi-random numbers is given. Finally, 
methods to generate samples according to a specified distribution are discussed. Among 
others, we outline the Metropolis algorithm and give an overview of existing algorithms for 
the generation of the phase space of final state particles in high energy collisions. 
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1 Introduction 



Monte Carlo methods find application in a wide field of areas, including many subfields of physics, 
like statistical physics or high energy physics, and ranging to areas like biology or analysis of 
financial markets. Very often the basic problem is to estimate a multi-dimensional integral 

I = jdx f(x) (1) 

for which an analytic answer is not known. To be more precise one looks for an algorithm which 
gives a numerical estimate of the integral together with an estimate of the error. Furthermore 
the algorithm should yield the result in a reasonable amount of time, e.g. at low computational 
cost. There is no point in developing an algorithm which gives the correct result but takes forever. 
However, there is no "perfect" algorithm suitable for all problems. The most efficient algorithm 
depends on the specific problem. The more one knows about a specific problem, the higher the 
chances to find an algorithm which solves the problem efficiently. We discuss in these lectures 
therefore a variety of methods, together with their advantages and disadvantages and hope that 
the reader will gain experience where a specific method can be applied. We will foccus on the 
evaluation of multi-dimensional integrals as a guideline through these lectures. Other applications 
of the Monte Carlo method, for example to optimization problems are not covered in these lectures. 
In the first section we discuss classical numerical quadrature rules. Monte Carlo integration, 
together with various variance-reduction techniques is introduced in the second section. The third 
section is devoted to the generation of uniform pseudo-random numbers and to the generation of 
quasi-random numbers in a d-dimensional hypercube. The fourth section deals with the generation 
of samples according to a specified probability distribution. This section is more focused on 
applications than the previous ones. In high energy physics Monte Carlo methods are mainly 
applied in lattice calculations, event generators and perturbative N"LO-programs. We therefore 
discuss the Metropolis algorithm relevant for lattice calculations, as well as various methods how to 
generate four- vectors of final state particles in high energy collisions according to the phase-space 
measure, relevant to event generators and N"LO-programs. 



2 Classical numerical integration 

Numerical quadrature rules have been known for centuries. They fall broadly in two categories: 
Formulae which evaluate the integrand at equally spaced abscissas (Newton-Cotes type formulae) 
and formulae which evaluate the integrand at carefully selected, but non-equally spaced abscissa 
(Gaussian quadrature rules). The latter usually give better results for a specific class of integrands. 
We also discuss the Romberg integration technique as an example of an extrapolation method. In 
the end of this section we look at the deficiances which occur when numerical quadrature rules 
are applied to multi-dimensional integrals. 

2.1 Newton-Cotes type formulae 

Formulae which approximate an integral over a finite interval by weighted values of the integrand 
at equally spaced abscissas are called formulae of Newton-Cotes type. The simplest example is 
the trapezoidal rule: 

xq+Ax 

J dxf(x) = ^ [/(a*) + /(*„ + Ax)} -V^f- (2) 

where x < £ < x a + Ax. To approximate an integral over a finite interval [x , x n ] with the help of 
this formula one divides the intervall into n sub-intervals of length Ax and applies the trapezoidal 
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rule to each sub-intervall. With the notation Xj — Xq +j • Ax one arrives at the compound formula 
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dxf{x) = X _^^f^)-Y2 J f" (3) 

3=0 

with wo = w n — 1/2 and Wj = 1 for 1 < j < n — 1. Further 

n 

where £j is somewhere in the intervall [acfj_i,Xj]. Since the position of the £j cannot be known 
without knowing the integral exactly, the last term in eq. || is usually neglected and introduces 
an error in the numerical evaluation. Note that the error is proportional to 1/n 2 and that we 
have to evaluate the function f(x) roughly n-times (to be exact (n + l)-times, but for large n the 
difference does not matter). 

An improvement is given by Simpson's rule, which evaluates the function at three points: 

Jdxf(x) = ^{f( Xo )+4f( Xl ) + f( X2 )]- { -^tfW(Z). (5) 

Xq 

This yields the compound formula 

71 1 ( 

= ^^E^fi^-wo^r 1 ^' (6) 

where n is an even number, wq = w n = 1/3, and for 1 < j < n we have Wj = 4/3 if j is odd and 
uij = 2/3 if j is even. The error estimate scales now as 1/n 4 . 

Newton's 3/8-rule, which is based on the evaluation of the integrand at four points, does not 
lead to an improvement in the error estimate: 

X? 5 

dxf(x) = ^[f( XQ ) + 3f( Xl ) + 3f(x 2 ) + f(x 3 )}--^--f^(0- (7) 

As Simpson's rule it leads to a scaling of the error proportional to 1/n 4 . An improvement is only 
obtained by going to a formula based on five points (which is called Boole's rule): 

x 4 

2Ar R( AtV 

dxf(x) = —{7f( Xo )+32f( Xl ) + 12f( X2 ) + 32f( X3 ) + 7f( X4 )}-^Lf( e \0- 

(8) 

Here the error of a compound rule scales like 1/n 6 . In general one finds for an integration rule of 
the Newton-Cotes type that, if the number of points in the starting formula is odd, say 2k — 1, 
then the error term is of the form 

c 2k ^(Ax) 2k+1 f^(0, (9) 

where c 2 k-i is some constant. If the number of points is even, say 2k, the error is of the same 
form: 

c 2fc (Az) 2fc+1 / (2fe) (£)- (10) 
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A formula whose remainder term is proportional to is called of degree n. Such a formula 

is exact for polynomials up to degree n. Now immediately one question arrises: What hinders us 
to improve the error estimate for a compound rule based on Newton-Cotes formulae by using even 
higher point formulae? First, the more refined a rule is, the more certain we must be that it is 
applied to a function which is sufficiently smooth. In the formulae above it is implied that, if the 
error term is proportional to j' < - 2fe '(£) that the function f(x) is at least (2&)-times differentiable 
and that f^ 2k \x) is continous. Applying the rule to a function which does not satisfy the criteria 
can lead to a completely wrong error estimate. (Try to apply the trapezoidal rule on the intervall 
[0, 1] to the function f(x) = for x < 1/4, f(x) = 1 for 1/4 < x < 3/4 and f(x) = for 
x > 3/4.) Secondly, it can be shown if the number of points become large, the coefficients of the 
Newton-Cotes formulae become large and of mixed sign. This may lead to significant numerical 
cancellations between different terms and Newton-Cotes rules of increasing order are therefore not 
used in practice. 



2.2 Gaussian quadratures 

The Newton-Cotes type rules approximate an integral of a function by the sum of its functional 
values at a set of equally spaced points, multiplied by appropriately chosen weights. We saw that 
as we allowed ourselves the freedom in choosing the weights, we could achieve integration formulas 
of higher and higher order. Gaussian integration formulae take this idea further and allows us 
not only the freedom to choose the weights appropriately, but also the location of the abscissas 
at which the function is to be evaluated. A Newton-Cotes type formula, which is based on the 
evaluation at n points is exact for polynomials up to degree n (if n is odd) or degree (n — 1) (if 
n is even). Gaussian quadrature formulae yield integration rules of degree (2n — 1). Furthermore 
these rules can be generalized such that they don't yield exact results for a polynomial up to de- 
gree (2n — 1), but for an integrand of the form "special function" times "polynomial up to degree 
(2n-l)". 

Before stating the main formula for Gaussian quadratures we first give an excursion to Lagrange's 
interpolation formula and introduce orthogonal polynomials. 



2.2.1 Lagrange interpolation formula 

Let xq, x\, x n be (n + 1) pairwise distinct points and let there be given (n + 1) arbitrary 
numbers y , yi, y n . We wish to find a polynomial p n {x) of degree n such that 

Pn{x l ) = y l , i = 0,l,...,n. (11) 

The solution is due to Lagrange and is given by 

n 

Pnix) = 5>/?0r), (12) 

i=0 

where the fundamental Lagrange polynomials are given by 

(x - x )...(x - x i - 1 )(x - x i+1 )...(x - Xn) , 13 s 

* (Xi - X )...(Xi - Xi-i)(Xi - X i+ i)...(Xi - x n ) ' 

Exercise 1: Prove this statement. 

Hint: You may first show that I™ (xj ) equals 1 if i = j and zero if i j. 



If we want to approximate a function f(x) by Lagrange's interpolating polynomial p n {x) such 
that f(xi) — p(xi) for i = 0,1, ...,n the remainder term is given by 

f( X ) = Pn (x)+ J ly ilOE), (14) 
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where II(x) = (x — xq)(x — x\)...(x — x n ) and min(x, Xo) < £ < max(x, x n ). 
Exercise 2: Show that 

is the unique polynomial of degree (2n + 1) for which 

P(2n + l)(xi) = f(Xi), p'(2n+l)(Xi) = f' (Xi) for i = 0, 1, H. (16) 

The remainder term is given by 

i-(2n+2),£N 

f(x) = P(2n+1) ( x ) + L^—M (Il(x)f , (17) 
where min(x,xo) < £ < max(x,x„). 

2.2.2 Orthogonal polynomials 

A sequence of polynomials Pq(x), Pi(x), in which P n (x) is of degree n is called orthogonal with 
respect to the weight function w(x) if 

dx w(x)Pi(x)Pj(x) = for % ^ j. (18) 

Here we should mention that a function w(x) defined on an interval [a, b] is called a weight 

b b 

function if w(x) > for all x € [a, 6], J dx w(x) > and J dx w{x)x 3 < oo for all j = 0, 1, 2, .... 

a a 

By rescaling each P n {x) with an appropriate constant one can produce a set of polynomials which 
are orthonormal. An important theorem states that the zeros of (real) orthogonal polynomials 
are real, simple and located in the interior of [a, b]. A second theorem states that if X\ < x-i < 
... < x n are the zeros of the orthogonal polynomial P n (x), then in each interval [a, xi], [xi, X2], 
[x„_i,x„], [x n ,b] there is precisely one zero of the orthogonal polynomial P„ + i(x). Well-known 
sets of orthogonal polynomials are the Legendre, Tschebyscheff, Gegenbauer, Jacobi, Laguerre 
and Hermite polynomials. Some of them are generalizations or specializations of others. In order 
to distinguish them one characterizes them by the interval on which they are defined and by the 
corresponding weight function. The first four (Legendre, Tschebyscheff, Gegenbauer and Jacobi) 
are defined on the intcrvall [—1, 1], the Laguerre polynomials are defined on [0, 00] and finally the 
Hermite polynomials are defined on [—00,00]. The weight function for the Legendre polynomials 
is simply w(x) = 1, the weight function for the Gegenbauer polynomials is w(x) = (1 — x 2 ) A1_1//2 
where /i > —1/2. The special cases corresponding to /1 = and [i — 1 give the Tschebyscheff 
polynomials of the first and second kind, respectively. And of course, in the case fj, = 1/2 the 
Gegenbauer polynomials reduce to the Legendre polynomials. The Jacobi polynomials have the 
weight function w(x) = (1 — x) a (l + x) 13 with a, (3 > —1. Here the special case a — (3 yields up 
to normalization constants the Gegenbauer polynomials. The generalized Laguerre polynomials 
have the weight function w(x) = x a e~~ x where a > —1. Here a — corresponds to the original 
Laguerre polynomials. The Hermite polynomials correspond to the weight function exp(— x 2 ). We 
have collected some formulae relevant to orthogonal polynomials in appendix 

2.2.3 The main formula of Gaussian quadrature 

If w(x) is a weight function on [a, 6], then there exists weights Wj and abscissas Xj for 1 < j < n 
such that 



J dx w(x)f(x) = J2 w of^) + ^r^yr J dx w ^ P^)] 



1, 



(19) 
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with 



U(x) = (x - xi)(x - x 2 )—(x - x n ), 

a < x\ < x 2 < ... < x n < b, a < £ < 6. (20) 

The abscissas are given by the zeros of the orthogonal polynomial of degree n associated to the 
weight function w(x). In order to find them numerically it is useful to know that they all lie in the 
interval [a, b]. The weights are given by the (weighted) integral over the Lagrange polynomials: 

6 

dxw(x)l](x). (21) 
Exercise 3: Let Po(x), Pi(x), ... be a set of orthonormal polynomials, e.g. 

b 

dxw(x)P i {x)P j (x) = Sij, (22) 

let xi, X2, Xn+i be the zeros of P n +i(x) and Wi, W2, ■ w n +i the corresponding Gaussian weights 
given by eq. l|4 Show that for i,j < n + 1 

n+l 

y] w k Pj(xk)Pj(xk) = S t j, (23) 
fc=i 

e.g. the Po, Pi, P n are orthonormal on the zeros of P n +i. This equation can be useful to check the 
accuracy with which the zeros and weights of P„+i have been determined numerically. 

2.3 Romberg integration 

Suppose we have an integration rule of degree (r — 1) which evaluates the integrand at n points, 
e.g. 

b 

dx f(x) = S [aM [f]+R [aM [f], S [aM [f] =Y i W j f{x j ) (24) 

i=i 

and R[ a ,b] [f] denotes the remainder. We may then construct a new rule of degree r as follows: 

b 

dx f(x) = pS [aM [/] + q (S[ a ,( 0+ 6)/2] [/] + %o+6)/2,6]) + R[a,b] [/] (25) 

with p = —1/(2'' — 1) and q = 2 r / (2 r — 1). Since the original rule eq. ^ is of degree (r — 1), the 
remainder term -Ru is of the form c'(b — a) r+1 /' r - ) (£), but for f = x r the r-th derivative is a 
constant. We find therefore for 

fb— V' +1 

R[a.b]W\ = pc{b~ay +l + 2qc{-^\ =0. (26) 
This proves that the new rule eq. 25 is now of degree r. 

Exercise Take as the original rule the trapezoidal rule and construct the improved rule. What do 
you get? Repeat the exercise with Simpson's rule and Newton's 3/8 rule as starting point. At how many 
different points do you have to evaluate the integrand with the improved rules? Is this efficient? Can the 
fact that p and q have opposite signs cause any problems? 
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In practice Romberg integration is used with the trapezoidal rule. Let 



S' 1 
o 



2 k 



-t 

3=0 



(27) 



be the trapezoidal sum with 2 k + 1 points and define 



s (k) 



A m S, 



(fc+i) 
m— 1 



? (fc) 

m — 1 



(28) 



Then the series S { °\ Sf\ sf } 



(0) Q (l) e (2) 



converges better then the series Sq , Sq , S t 



Romberg 



integration is a special case of an extrapolation method (Richardson extrapolation to be precise). 



Based on a few estimates S, 
limit k — > oo. 



(fc-i) 



S^ fc) with 2" 



1. 



1 points one extrapolates to the 



2.4 Multi-dimensional integration 

In many problems multi-dimensional integrals occur, which have to be evaluated numerically. One 
may try to extend the one-dimensional integration formulae to d-dimensional integration formulae 
by viewing the d-dimensional integral as an iteration of one-dimensional integrals and applying a 
one-dimensional integration rule in each iteration. As an example we consider an integral over the 
rf-dimensional hypercube [0, l] d evaluated with the help of the trapezoidal rule: 

d d uf(u u ...,u d ) = -i X ... X ,r, ..,r, / ±) \-o(±). (29) 

31=0 j d =0 V / \ / 

In total we have to evaluate the function N = (n + l) d w n d times. Since the necessary computing 
time is proportional to TV we observe that the error scales as N~ 2 / d . With increasing dimension d 
the usefulness of the error bound 0(N~ 2 / d ) declines drastically. Changing for example from the 
trapezoidal rule to Simpson's rule does not change the situation significantly: The error bound 
would scale in this case as N~ 4 / d . We will later see that Monte Carlo integration yields an error 
which decreases with l/yiV independent of the number of dimensions. It has therefore become 
the method of choice for numerical integration in high dimensions. 



2.5 Summary on numerical quadrature rules 

Numerical quadrature rules are the best method for one-dimensional integrals. If the integrand is 
sufficiently smooth and if one knows an absolute bound for a certain derivative of the integrand, 
they yield an exact error estimate. The efficiency of numerical quadrature rules decreases rapidly 
with the number of dimensions. Furthermore, for complicated integration boundaries, which have 
to be imbedded for example into a hypercube, the integrand is no longer a smooth function and 
the estimate for the error can no longer be used. 

Further reading: You can find more information on numerical quadrature rules in the books 
by Davis and Rabinowitz Q and by Press et al. Q. 

Exercise 5: Evolution of parton densities using numerical quadrature with Laguerre polynomials. This 
method is due to D.A. Kosower Bl. The evolution equations for the quark non-singlet parton distributions 
f(x, Q 2 ) of the proton read 

Q 2 df »n? 2) = P{x,Q 2 )®f{x,Q 2 ), (30) 
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where x stands for the nucleon's momentum fraction carried by the parton, P(x, Q 2 ) is the Altarelli-Parisi 
evolution kernel, and ® denotes the convolution 

1 1 

A(x)®B(x) = / " dy j dz.5(x-yz)A(y)B(z) (31) 



The evolution kernel is known to next-to-leading order: 

P(x,Q 2 ) = a s (Q 2 )P (x)+a 2 (Q 2 )P 1 {x)+O(a 3 3 ), (32) 

where we have introduced a s (Q 2 ) = a s (Q 2 ) / +k . In the Mellin-transform approach one factorizes eq. ^ by 
taking Mellm moments. The Mellin moments of a function h(x) are given by 

i 

h z = dx x z ~ 1 h(x). (33) 



o 

Truncating to next-to-leading order we obtain: 

Q 2 ^fi = (a s (Q 2 )Po+a 2 s (Q 2 )Pi)f Z (Q 2 )- (34) 
This equation is formally solved by the introduction of an evolution operator E z : 

r(Q 2 ) = E*(Q 2 ,Ql)f{Ql). (35) 
The evolution operator is for the quark non-singlet case of the form: 



E z {Q\Ql) 



a s (Q 2 ) - a s {Qt) t 
1 + l 71 "^ 70 



(36) 



where y§ and j" are the first and second coefficients of the anomalous dimensions. Retrans forming back 
one obtains the evolved parton distribution in x-space by 

f{x,Q 2 ) = ^- J dzx-*f{Q 2 ), (37) 
c 

where the contour C runs to the right of all singularities of the integrand. 

The parton distributions are usually parametrized at the input scale Q% in a form like 

xf(x,Ql) = AiX ai {l-x) Pi (38) 

i 

with Mellm transform 

f(Q 2 o) = ^2 AtB(z + at- 1,1 + fr), (39) 

i 

where B(x,y) is the beta function. One example is the CTEQ J^M structure function for the u-quark 
valence distribution u v , given at the scale Qq — 1.6GeV: 

xu v = 1.344 a; - 501 (l - t) 3 ' 689 (l + 6.402 x°- srs ) . (40) 

Droping from now on the arguments Q 2 and Qq our task is to evaluate the integral 

f(x,Q 2 ) = -Re I dz-F(z), F(z) = x~ z E z V A,B(z + a t - 1, 1 + ft), (41) 

IT I l 

where we have used complex conjugation to replace the contour C by C s , starting at the real axis, right 
to the right-most pole and running upwards to infinity. The most elegant way to solve this problem is to 
choose a contour in such a way that the integrand can very well be approximated by some set of orthogonal 
polynomials. We neglect for the moment the evolution operator E z . We try a parabolic contour 

1 2 

z(t) = z +it + -c 3 t 
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and determine the parameters zq and C3 according to 



(42) 



Finally we change variables 



As the result we obtain 



•Hi)'. 



/(*> Q 2 ) = ff / ^j= e ~ URe [ e " ( X - ^(«(caV5))] (44) 



and the integrand can be approximated by Laguerre polynomials L n l (as). Start from the CTEQ parame- 
terization and find zo by solving F (zo) = numerically (you may use Newton-Raphson for example) and 
determine the parameters C2 and C3. Evaluate the integral by using a Gaussian quadrature formula for 
Laguerre polynomials with 3, 5 or 10 points. Here you have to find the correct abscissas and weights. Since 
we set the evolution operator E z — 1 you should recover the original parton density at the scale Q\. The 
inclusion of the evolution operator does only slighty modify the integrand and one can therefore use the 
same contour as in the non-evolved case. (The relevant anomalous dimensions can be found for example 
in Eh.) 



3 Monte Carlo techniques 

We have seen in the previous section that numerical quadrature rules are inefficient for multi- 
dimensional integrals. In this section we introduce Monte Carlo integration. We will show that 
for Monte Carlo integration the error scales like 1/y/N, independent of the number of dimensions. 
This makes Monte Carlo integration the preferred method for integrals in high dimensions. But, 
after the first euphoria is gone, one realizes that convergence by a rate of 1/s/N is pretty slow. 
We discuss therefore several techniques to improve the efficiency of Monte Carlo integration. 



3.1 Monte Carlo integration 

We consider the integral of a function f(u\, ...,114), depending on d variables u\, u c i over the 
unit hypercube [0, l] d . We assume that / is square-integrable. As a short-hand notation we will 
denote a point in the unit hypercube by x = (ui, ...,itd) & n d the function evaluated at this point 
by f{x) — f(u\, Ud). The Monte Carlo estimate for the integral 

dxf(x)= / d d uf(u 1 ,...,u d ) (45) 



is given by 

E 



1 N 

V £/(*»)• ( 4fi ) 



N 

n=l 



The law of large numbers ensures that the Monte Carlo estimate converges to the true value of 
the integral: 



1 N 

J im £/(*«) = L ( 4? ) 



71=1 
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In order to discuss the error estimate for finite N, we first introduce the variance er 2 (/) of the 
function f(x): 



a*(f) = I dx(f(x)-lf. (48) 



We can then show that 



N x 2 



|^ ff /W_J = (49) 



n=l 



Exercise 6: Prove eq. \4Q. 

Hint: You may want to introduce an auxiliary function g(x) = f(x) — I and show f dxg(x) — first. 

Eq. |^ can be interpreted to mean that the error in the Monte Carlo estimate is on the aver- 
age a{f)/\/~N . <y(f) is called the standard deviation of /. The central limit theorem tells us then 
that the probability that our Monte Carlo estimate lies between I — acr(f) / VN and / + ba(f)/V~N 
is given by 



lim Prob (-a^fi < — V f(x n ) - I < b^fi] = -±= f dt cxp (-- I . ( r,o) 



This also shows that error in a Monte Carlo integration scales like 1/s/N independent of the 
dimension d. Of course, in practice one cannot obtain easily the exact value for the variance u 2 (f) 
and one uses the Monte Carlo estimate 

N N 

s2 - attiE ^EtfW) e2 (51) 

n— 1 n— 1 

instead. 

Exercise 7: For a reliable error estimate we had to require that the function f is square integrable. If 
the function f is integrable, but not square integrable, the Monte Carlo estimate E for the integral will still 
converge to the true value, but the error estimate will become unreliable. Do a Monte Carlo integration of 
the function f(x) — 1/y/x over the interval [0, 1]. 

We would like to draw the attention to the fact that Monte Carlo integration gives only a proba- 
bilistic error bound, e.g. we can only give a probability that the Monte Carlo estimate lies within 
a certain range of the true value. This should be compared to numerical quadrature formulae. If 
we use for example the trapezoidal rule and if we know that the second derivative of / is bounded 
by, say 

|/"0)|<c, x a < x < x + Ax, (52) 
we obtain a deterministic error bound: 



^(f(x ) + f(x +Ax))- I ,/.,-/(.,■: 



< ■ OB) 



Here the error is guaranteed to be smaller than c(Ax) 3 /12. On the other hand to obtain this 
bound we used the additional information on the second derivative. 

Exercise 8: Buffon's needle technique to estimate n. Buff on used in 1777 the following procedure to 
estimate ty : A pattern of parallel lines separated by a distance d is laid out on the floor. Repeatedly a 
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needle of length d is thrown onto this stripped pattern. Each time the needle lands in such a way as to 
cross a boundary between two stripes, it is counted as a hit. The value of ir is then estimated from twice 
the number of tries divided by the number of hits. The above recipe is based on the fact that the probability 
of a hit is 2/n. This can be seen as follows: Let <p be the angle between the needle and the perpendicular to 
the stripes. For a given <p the probability of a hit is j cosy|. Since all angles are equally likely, the avarage 
value of |cosy?| can be calculated by integrating over all angles and dividing by the range. The problem 
with that algorithm is, that it gives large statistical errors. Write a Monte Carlo program which estimates 
7r by using Buffon's needle technique. How often do you have to throw the needle to get the first three, five 
or seven digits of n? 



3.2 Variance reducing techniques 

We have seen that the error estimate of a Monte Carlo integration scales like 1/y/N. The main 
advantage of Monte Carlo integration is the fact, that the error estimate is independent of the 
dimension d of the integral. However we pay the price that the Monte Carlo estimate for the 
integral converges relatively slow to the true value at the rate of l\~N. Several techniques exist 
to improve the situation. 



3.2.1 Stratified sampling 

This technique consists in dividing the full integration space into subspaces, performing a Monte 
Carlo integration in each subspace, and adding up the partial results in the end. Mathematically, 
this is based on the fundamental property of the Riemann integral 

1 a 1 

dxf(x) = J dxf(x) + J dxf(x), < a < 1. (54) 

a 

More generally we split the integration region M = [0, l] d into k regions Mj where j = 1, k. In 
each region we perform a Monte Carlo integration with Nj points. For the integral / we obtain 
the estimate 

E = £v^£ /(ajf|) (55) 

and instead of the variance a 2 (f)/N we have now the expression 

±^ai^ (fU (56) 

3 = 1 3 

with 



ct2 (/)| M = nTTTTT / f( x ) ~ Z7ZT7TTT I ^ x f( x 




dx f(x) . (57) 



If the subspaces and the number of points in each subspace are chosen carefully, this can lead to a 
dramatic reduction in the variance compared with crude Monte Carlo, but it should be noted that 
it can also lead to a larger variance if the choice is not appropriate. If we take a = 1/2 in eq. pj 



13 



and use N a points in the first region [0, a] and iVj, points in the second region [a, 1] we obtain for 
the error estimate 



i M/)L . ff2 (/)l 
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4\ N a N b / 

which is minimized for fixed N = N a + Nb by choosing 

N a _ tr(f)\ a 



N ^ °{f)\ a + °(f)\ b 



(58) 



(59) 



In general the total variance is minimized when the number of points in each subvolume is pro- 
portional to u(f)\ M .. 

3.2.2 Importance sampling 

Mathematically, importance sampling corresponds to a change of integration variables : 

dxf{x)= ! f M p[x)dx= lw) dP{x) m 

with 

= iS^ p{x) - (6i) 

If we restrict p{x) to be a positive-valued function p(x) > and to be normalized to unity 

dxp(x) = 1 (62) 



we may interpreted p(x) as a probability density function. If we have at our disposal a random 
number generator corresponding to the distribution P(x) we may estimate the integral from a 
sample X\, x^ of random numbers distributed according to P(x): 



N 

I 

E 



- N^p{x n y (63) 

The statistical error of the Monte Carlo integration is given by <j(f/p)/y/N, where an estimator 
for the variance a 2 (/ jp) is given by 

L ) - ^Ef^NV-^- ( 64 ) 
Pj N^\p{x n )) 

We see that the relevant quantity is now f(x)/p(x) and it will be advantageous to choose p(x) 
as close in shape to f(x) as possible. If f(x) is positive-valued one might be tempted to choose 
p(x) = cf(x). The constant is easily found to be 1/1 and the variance a 2 (f/p) turns out to be 
zero. So we would have found a perfect method which would return the correct value with only 
one sampling point. Of course life in not so simple and the crucial point is that in order to sample 
f/p we must know p, and in order to know p(x) = f(x)/I we must know /, and if we already 
know / we don't need a Monte Carlo integration to estimate it. So in practice one chooses p(x) 
such that it approximates \f(x)\ reasonably well in shape and such that one can generate random 
numbers distributed according to P(x). 

One disadvantage of importance sampling is the fact, that it is dangerous to choose functions 
p(x), which become zero, or which approach zero quickly. If p goes to zero somewhere where / 
is not zero, a 2 (f jp) may be infinite and the usual technique of estimating the variance from the 
sample points may not detect this fact if the region where p = is small. 
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3.2.3 Control variates 



As in importance sampling one seeks an integrable function g which approximates the function / to 
be integrated, but this time the two functions arc subtracted rather than divided. Mathematically, 
this technique is based on the linearity of the integral : 

dx f(x) = J dx(f{x) - g{x)) + J dx g{x). (65) 

If the integral of g is known, the only uncertainty comes from the integral of (/ — g), which will 
have smaller variance than / if g has been chosen carefully. The method of control variates is more 
stable than importance sampling, since zeros in g cannot induce singularities in (/ — g). Another 
advantage over importance sampling is that the integral of the approximating function g need not 
be inverted analytically. 

3.2.4 Antithetic variates 

Usually Monte Carlo calculations use random points, which are independent of each other. The 
method of antithetic variates deliberately makes use of correlated points, taking advantage of the 
fact that such a correlation may be negative. Mathematically this is based on the fact that 

var(/i + / 2 ) = var(/i)+var(/ 2 ) + 2covar(/i,/ 2 ). (66) 

If we can arrange to choose points such that f± and / 2 are negative correlated, a substantial 
reduction in variance may be realized. The most trivial example for the application of the method 
of antithetic variates would be a Monte Carlo integration of the function f(x) = x in the intervall 
[0, 1] by evaluating the integrand at the points xi and 1 — Xi. 

3.3 Adaptive Monte Carlo methods 

The variance-reducing techniques described above require some advance knowledge of the be- 
haviour of the function to be integrated. In many cases this information is not available and one 
prefers adaptive techniques, e.g. an algorithm which learns about the function as it proceeds. 
In the following we describe the VEGAS-algorithm |8|, which is widely used in high-energy 
physics. VEGAS combines the basic ideas of importance sampling and stratified sampling into an 
iterative algorithm, which automatically concentrates evaluations of the integrand in those regions 
where the integrand is largest in magnitude. VEGAS starts by subdividing the integration space 
into a rectangular grid and performs an integration in each subspace. These results are then used 
to adjust the grid for the next iteration, according to where the integral receives dominant con- 
tributions. In this way VEGAS uses importance sampling and tries to approximate the optimal 
probability density function 

/ \ _ 1/0*01 

PoptimaW - J dxlf{x) \ V>n 

by a step function. Due to storage requirements one has to use a separable probability density 
function in d dimensions: 

p(ui,...,Ud) = pi(ui) -p 2 (u 2 ) • ... -pd{ud)- (68) 

Eventually after a few iterations the optimal grid is found. In order to avoid rapid destabilizing 
changes in the grid, the adjustment of the grid includes usually a damping term. After this initial 
exploratory phase, the grid may be frozen and in a second evaluation phase the integral may be 
evaluated with high precision according to the optimized grid. The separation into an exploratory 
phase and an evaluation phase allows one to use less integrand evaluations in the first phase and 
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to ignore the numerical estimates from this phase (which will in general have a larger variance). 
Each iteration yields an estimate Ej together with an estimate for the variance S"j: 

Here Nj denotes the number of integrand evaluations in iteration j. The results of each iteration 
in the evaluation phase are combined into a cummulative estimate, weighted by the number of 
calls Nj and their variances: 

If the error estimates Sj become unreliable (for example if the function is not square integrable), 
it is more appropriate to weight the partial results by the number Nj of integrand evaluations 
alone. In addition VEGAS returns the x 2 per degree of freedom: 



This allows a check whether the various estimates are consistent. One expects a x 2 /dof not much 
greater of one. 

In low dimensions one may use stratified sampling instead of importance sampling. If in each 
cell at least two points are thrown, the contribution to the variance from each cell may be esti- 
mated and the grid can be adjusted such that it minimizes the total variance. This method is 
however restricted to low dimensions. If b denotes the number of bins per axis, d the number of 
dimensions and one requires two points per cell, one needs at least 

N = 2b d (72) 

integrand evaluations. This number grows exponentially with d and for large d one has to resort to 
importance sampling. Note that for large d there are inevitably cells, into which no point is thrown. 



Exercise 9: Write a computer program which implements the adaptive grid technique of the VEGAS- 
algorithm in one dimension. It should integrate a function f(x) in the intervall [0, 1]. Split the interval 
[0, 1] into b bins [xi-i,Xi], where = xo < x\ < ... < x b = 1. The probability that a point is thrown into 
bin i is 1/6. Inside a bin the points a chosen with a uniform probability. After each iteration the boundaries 
are readjusted. Adjust the bins in such a way that each bin would give a contribution ofl/b J dx\f(x)\, 
based on the estimate you got in the last iteration. 



3.4 Multi-channel Monte Carlo 

If the integrand f(x) has sharp peaks, crude Monte Carlo usually leads to poor results. The sit- 
uation can sometimes be improved by a remapping of variables, such that the integrand becomes 
more flat. However there might be situations where the integrand exhibits different peaks in dif- 
ferent regions. In such cases it is often impossible to find a variable transformation, which remaps 
all peaks simultaneously. Multi-channel Monte Carlo offers a solution, if the transformations for a 
single peak structure are known. Each such transformation is known as a channel. Each channel 
is specified by a probability density function Pi(x) and a mapping P^ 1 (y) from random numbers 
y distributed according to Pi(x) into the region of integration: 

x = Pr\y). (73) 
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Each density is non-negative and normalized to unity: J dxpi(x) = 1 for i = 1, ...,m, where m 
denotes the number of channels. Let on > 0, i — 1, ...,m be non-negative numbers, such that 

n 

E a * = L ( 74 ) 

i=l 

A specific channel is then selected with probability o<j. In practice one fixes the total number of 
integrand evaluations and evaluates each channel roughly iVj ~ QijiV times. The integral we want 
to calculate is 



i = jdx fix) = f>/ S^*)' 



where p(x) = ctiPi(x). The Monte Carlo estimate for the integral is then 

m N 



(75) 



E iVV^4. (76) 



The expected error of the integration is given by 



W(a) - I 2 



N 

where W(a) is given by 



(77) 



W(a) = jr ai I dPtix). (78) 



i=i 



By adjusting the parameters ctj one may try to minimize W(a). Since the integral I does not 
depend on the parameters a* one may change the Ui during the integration. The at do not affect 
the estimate for the integral, but only the estimate for the error. 

The method suggested in || starts from an initial set c^, performs a few hundred Monte Carlo 
evaluations to estimate 

Wiia') = Jdx Pl (x)(^j (79) 
and rescales the parameters according to 



E l «' l (^(«')) /3 ' 

The suggested values for the parameter /3 range from 1/2 to 1/4 [llC 



(80) 



3.5 Summary on Monte Carlo techniques 

Monte Carlo integration offers a tool for numerical evaluation of integrals in high dimensions. Fur- 
thermore Monte Carlo integration works for smooth integrands as well as for integrands with dis- 
continuities. This allows an easy application to problems with complicated integration boundaries. 
However the error estimate scales always with 1 / y/N. To improve the situation we introduced the 
classical variance reducing techniques (stratified sampling, importance sampling, control variates, 
antithetic variates) as well as two advanced methods: The adaptive VEGAS-algorithm, which 
learns about the integrand as it proceeds and multi-channel Monte Carlo, which is useful when 
the integrand is sharply peaked in different regions of the integration region. 

Further reading: The material presented in this section is based on the book by Hammersley 
and Handscomb || and the review article by James 0. 
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4 Random numbers 



Since a computer is a deterministic machine, truly random numbers do not exist on a computer. 
One uses therefore pesudo-random numbers. Pseudo-random numbers are produced in the com- 
puter deterministicly by a simple algorithm, and are therefore not truly random, but any sequence 
of pseudo-random numbers is supposed to appear random to someone who doesn't know the al- 
gorithm. More quantitatively one performs for each proposed pseudo-random number generator a 
series of tests T±, T2, T n . If the outcome of one test differs significantly from what one would 
expect from a truly random sequence, the pseudo-random number generator is classified as "bad" . 
Note that if a pseudo-random number generator has passed n tests, we can not conclude that it 
will also pass test T n+ \. 

In this context also the term "quasi-random numbers" appears. Quasi-random numbers are not 
random at all, but produced by a numerical algorithm and designed to be distributed as uniformly 
as possible, in order to reduce the errors in Monte Carlo integration. 

4.1 Pseudo-random numbers 

By today's standard a good random number generator should satisfy the following criteria: 

• Good distribution. The points should be distributed according to what one would expect 
from a truly random distribution. Furthermore a pseudo-random number generator should 
not introduce artificial correlations between succesivley generated points. 

• Long period. Both pseudo-random and quasi-random generators always have a period, after 
which they begin to generate the same sequence of numbers over again. To avoid undesired 
correlations one should in any practical calculation not come anywhere near exhausting the 
period. 

• Repeatability. For testing and development, it may be necessary to repeat a calculation with 
exactly the same random numbers as in the previous run. Furthermore the generator should 
allow the possibility to repeat a part of a job without doing the whole thing. This requires 
to be able to store the state of a generator. 

• Long disjoint subsequences. For large problems it is extremely convenient to be able to 
perform independent subsimulations whose results can later be combined assuming statistical 
indcpcdcncc. 

• Portability. This means not only that the code should be portable (i.e. in a high-level 
language like Fortran or C), but that it should generate exactly the same sequence of numbers 
on different machines. 

• Efficiency. The generation of the pseudo-random numbers should not be too time-consuming. 
Almost all generators can be implemented in a reasonably efficient way. 

To test the quality of a pseudo-random number generator one performs a series of test. The 
simplest of all is the frequency test. If the algorithm claims to generate pseudo-random numbers 
uniformly distributed in [0, 1], one divides this intervall into b bins, generates n random number 
Mi, U2, ... u n and counts the number a pseudo-random number falls into bin j (1 < j < b). One 
then calculates x 2 assuming that the numbers are truly random and obtains a probability that 
the specific generated distribution is compatible with a random distribution. The serial test is a 
generalization of the frequency test. Here one looks at pairs of succesive generated numbers and 
checks if they are uniformly distributed in the area [0, 1] x [0, 1]. 

Another popular test is the gap test. Let a and (3 be two real numbers with < a < (3 < 1. Given 
a sequence u\, 112, ... of supposedly random numbers one determines the length of consecutive 
subsequences Uj, Uj+i, Uj+ r in which Uj +r lies between a and (3, but the other u's don't. Such 
a subsequence is called a gap of length r. Having determined the gaps of length 0, 1, ... ,t, one 
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then applies a x 2 -test to this empirical distribution. 

We give an overview of some of the most popular algorithms for pseudo-random number gen- 
erators. All algorithm are given such that they generate integer numbers up to to. To obtain real 
numbers in the interval [0, 1] one divides therefore by to. Some algorithm can be implemented in 
such a way that they work directly with floating point numbers. 

4.1.1 Multiplicative linear congruential generator 

Each succcsivc integer is obtained by multiplying the previous one by a well chosen multiplier, 
optionally adding another constant, and throwing away the most significant digits of the result: 

Si = (asi-i + c) mod to. (81) 

where a and m are relativly prime. The linear congruential generator has a period no longer than 
to. Its theory is quite well understood [ fj"l| . For the modulus to one chooses usually a number of 
the form 2 r , 2 r + 1 or 2 r — 1, which allow a fast implementation of the routine. The choice to = 2 r 
is not always the best one. In this case it can be shown that the I lowest order bits of have 
a period not exceeding 2 l . For example, the last bit is either constant or stricly alternating, the 
last two bits have a period no longer than two, the last three bits a period no longer than 8. In 
most applications the lowest order bits are however insignificant. As already stated the maximal 
period is to. There is a theorem which states that the generator has the maximal period if and 
only if c > is relatively prime to m, a — 1 is a multiple of p for every prime p dividing m and 
a — 1 is a multiple of 4, if 4 divides to. In the case c = the maximal possible period is attained 
if so is relatively prime to to and if a is a primitive element modulo to. (If a is relatively prime to 
to, the smallest integer A for which a A = 1 mod to is called the order of a modulo to. Any such a 
which has the largest possible order modulo to is called a primitive element modulo to.) In this 
case the period is A(2 r ) = 2 r ~ 2 if m = 2 r with r > 3, A(p r ) = p r ~ 1 (p — 1) if to = p r with p > 2 a 
prime number and the least common multiple of A(p^), ... A(p[') if to = p^ 1 ■ ... ■ p r t l with pi, 
Pt being prime numbers. This leaves the question in which cases a is a primitive element modulo 
to. If to = 2 r with r > 4 the answer is if and only if a mod 8 equals 3 or 5. In the case m = p 
where p is a prime number greater than 2, a has to satisfy a ^ mod p and a( p ~ 1 '' q =/= 1 mod p 
for any prime divisor q of p — 1 . 

One choice for the constants would be a = 69069, c = and to = 2 32 . Other choices are 
a = 1812433253, c = and to = 2 32 or a = 1566083941, c = and to = 2 32 . As a historical 
note, the first proposal was a — 23, c = and to = 10 8 + 1. IBM used in the sixties and seventies 
the values a — 65539, c = and to = 2 31 as well as a — 16807, c = and to = 2 31 — 1. These 
generators are not recommended by today's standards. 

Exercise 10: Consider the simple multiplicative linear congruential generator with a = 5, c = 1, m = 16 
and so = 1. Write down the first twenty numbers generated with this method. How long is the period ? 
Since m = 2 4 write down the sequence also in the binary representation and look at the lowest order bits. 

One of the drawbacks of the multiplicative linear congruential generator was discovered by the spec- 
tral test. Here one considers the set of points (s n , s n +i, s n +d-i) of consecutive pseudo-random 
numbers in d-dimensional space. It was discovered that these points lie mainly in hyperplanes. 

Exercise 11: Generate a three- and a two-dimensional plot of points (s n , Sn+i, Sn+2) and (s n ,s n +i) taken 
from the generator a = 137, c = 187 and m = 256. 

Since we work with a finite precision, even sequences from truly random numbers, truncated 
to our precision, would reveal a lattice structure. Let 1/fi denote the distance between neigh- 
bouring points in a one-dimensional plot, l/i>2 the distance between neighbouring lines in a two- 
dimensional plot, 1/^3 the distance between neighbouring planes in a three-dimensional plot, etc. 
The difference between truly random sequences, truncated to a certain precision, and sequences 
from a multiplicative linear congruential generator is given by the fact, that in the former case 
l/vd is indepenent of the dimension d, while in the latter case 1/vd increases with the dimension d. 
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Exercise 12: We are interested in the integral 



111 

I = 2 J dx J dy J dz sin 2 (27r(9a; - Qy + z)) . (82) 


This integral can be solved analytically by doing the z -integration first and yields 1. Suppose we are 
ignorant about this possibility and perform a brute-force Monte Carlo integration using the multiplicative 
linear congruential generator a = 65539, c = and m — 2 31 . Innocent as we are we use three consecutive 
random numbers from the generator to define a point in the cube: (x, y, z) n = (ss n /m, S3„+i/m, szn+2/m). 
What do you get ? In order to see what went wrong show that three consecutive numbers of this generator 
satisfy 

(9s n -6s n+1 +s n+2 ) mod2 31 = 0. (83) 



4.1.2 Lagged Fibonacci generator 

Each number is the result of an arithmetic operation (usually addition, sometimes subtraction) 
between two numbers which have occured somewhere earlier in the sequence, not necessarily the 
last two : 

Si = (si_ p + Sj_g) mod m. (84) 

A popular choice is: 

Si = (sj_24 + Si~ 55 ) mod 2 32 . (85) 
It was proposed in 1958 by G.J. Mitchell and D.P. Moore. This generator has a period of 

2 f (2 55 - 1) , (86) 

where < / < 32. 

4.1.3 Shift register generator 

The generalized feedback shift register generator jl4j is based on the recurrence relation 

Si = Si—p Si— p+q, (87) 

where ® deontes the bitwise exclusive-or operation ( 000 = 101 = 0, 0©1 = 100 = 1). One 
choice for lags is given by p = 250 and q = 103. With this choice one has a period of 2 250 — 1. 

4.1.4 RANMAR 

RANMAR E3] is combination of two generators. The first one is a lagged Fibonacci generator, 

n = - r,_3 3 ) mod 2 24 . (88) 

The second part is a simple arithmetic sequence defined by 

/ ti_i - 7654321, if - 7654321 > 0, 

\ U-i - 76 5 4 321 + 2 24 - 3 otherwise. [ ' 

The final random number is then obtained as 

s l = {n - U) mod 2 24 . (90) 
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4.1.5 ACARRY/RCARRY/RANLUX 

Marsaglia, Narasimhan and Zaman fl6| proposed a lagged Fibonacci generator with a carry bit. 
One first computes 

A„ = s n -i - s n -j - c„_i, j > i > 1, (91) 

and sets 

s„ = A n , c„ = 0, if A„ > 0, , > 

s„ = A n + 6, c n = 1 otherwise. 

With the choice j = 24, i = 10 and & = 2 24 this generator is known under the name RCARRY. One 
has observed that this algorithm fails the gap test and discovered correlations between successive 
vectors of j random numbers. To improve the situation Liischer Jl^j proposed to read j numbers, 
and to discard the following p — j ones. With the parameter p — 223 the generator is known as 
RANLUX. 

4.2 Quasi-random numbers 

A major drawback of Monte Carlo integration with pseudo-random numbers is given by the fact 
that the error term scales only as 1/y/N. This is inherent to methods based on random numbers. 
However in Monte Carlo integration the true randomness of the generated numbers is not so much 
relevant. More important is to sample the integration region as uniform as possible. This leads 
to the idea to choose the points deterministically such that to minimize the integration error. If 
the integrand is sufficiently smooth one obtains a deterministic error bound, which will scale like 
N^ 1 \n p (N) for some p. 

We say a sequence of points is uniformly distributed if 

1 N 

Ji^ - !>./(<) = vol(J) (93) 
n=l 

for all subintervals J C [0, 1] . Here vol(J) denotes the volume of J and Xj( x ) the characteristic 
function of J, e.g. xj( x ) = 1 if # £ ■/ and xj( x ) = otherwise. As a measure of how much a finite 
sequence of points x\, X2, xn deviates from the uniform distribution we define the discrepancy 

D = ™P f^E^(^)-vol(J)J, (94) 

where the supremum is taken over a family J of subintervals J of [0, l] d . By specifying the family 
J we obtain two widely used concepts of discrepancy: The extreme discrepancy is obtained from 
a family J , where each subinterval J contains the points x = (ui, Ud) with 

< Ui < u™ ax , i = l,...,d. (95) 

A subinterval J for the extreme discrepancy can therefore be specified by giving the "lower-left- 
corner" x mm = (m™«, ...,it™") and the "upper-right-corner" x max = (u™ ax , uf ax ). The star 
discrepancy is a special case of the extreme discrepancy for which u™ m = ... = tij'™ = 0. The star 
discrepancy is often denoted by D* . One can show that 

D* < D extreme < 2 d D*. (96) 

Exercise 13: Prove this statement. 

Hint: The first inequality is trivial. To prove the second inequality you may start for d = 1 from the fact 

equals the number of points in [0, vJx ax [ minus the number in 
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[0,«i [■ 



If in the definition of the discrepancy (eq. 
obtains the mean square discrepancy. 



1) the supremum norm is replaced by 



one 



Before coming to the main theorem for quasi-Monte Carlo integration, we first have to intro- 
duce the variation V(f) of the function f(x) on the hypercube [0, l] d . We define 



nn = E E v(k) w 

fe=l l<i 1 <t a <...<tj,<d 



(97) 



where V^ k \f) of a function f(ui, ■■■,Uk) depending on k variables U\, 

d k f(u 1 , ...,u k ) 



Uk is given by 



V {k \f) 



l i 
du\... J duk 
<> o 



du\...duk 



(98) 



and V < " k Hf)\i 1 i denotes the restriction of / to the fc-dimensional face defined by (m, G 
[0, l] d and Uj = 1 if j ^ ii, We further require that the partial derivatives in eq. ^8] are 

continous. If V(f) is finite one says that / is of bounded variation on [0, 

We now have the main theorem of quasi-Monte Carlo integration: If / has bounded variation 
on [0, l] d then for any x\, € [0, l] d we have 



1 N f 

n=l J 



<V(f)D*(x u ...,x N ). 



(99) 



Eq. ^ shows that the error will be bounded by the product of the variation of the function times 
the discrepancy of the point set. Since the variation of the function depends on the function / 
alone, but not on the chosen point set x\,..., xn, it is inherent to the problem. The error may be 
reduced by choosing a point set with a low discrepancy. 

Exercise 14: Show that star discrepancy of the one-dimensional point set x n = (2n— 1)/(2N), n — 1, TV, 
is given by D* = 1/(2N). (It can be shown that this is the smallest discrepancy we can get in one dimen- 
sion with a finite set of N points. Note that the construction of the points x n depends on the predefined 
total number of points N. It can be shown that there is no infinite sequence x\, xi, whose first N 
elements have the theoretical minimal discrepancy 1/(2JV). Instead there is a theorem which states that 
for the discrepancy D*(N) of the first N elements of an infinite sequence one has D*(N) > cN -1 \n(N) 
for infinitely many N .) 



There is a general theorem which states that the star discrepancy of a finite point set cannot 
be made smaller than a certain value, which depends on the number of points iV and the dimen- 
sionality d. In practical calculations one is not so much interested in a finite sequence Xi,...,Xn 
which attains the minimal discrepancy for a given value of N, but more in an infinite sequence 
whose first N elements have a small discrepancy. Such a sequence is called a low-discrepancy 
sequence. The advantages are that one can change the value of N in a quasi-Monte Carlo integra- 
tion without loosing the previously calculated function values. Again there is a general theorem 
which states that the discrepancy of such a sequence cannot decrease faster than 1/N for large N. 
In practice one knows explicit sequences which decrease like TV -1 ln p (7V) for some p. We will now 
turn our attention to the construction of such sequences. 
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4.2.1 Richtmyer sequences 

One of the first quasi-random number generators which yielded an infinite sequence in arbitrary 
dimension d was given by Richtmyer. The n-th point x n — (u\, Ud) in the sequence is given by 

Ui = nSi mod 1, i = l,...,d, (100) 

where the Si are constants which should be irrational numbers in order to ensure an infinite period. 
Since truly irrational numbers cannot be represented in computers the standard choice is to take 
Si equal to the square root of the i-th prime number. 

4.2.2 Halton sequences 

Let b > 2 be an integer number. Every integer n > has a unique digit representation in base 6, 

oo 

n = ^ajV, (101) 

j=o 

where a,j E {0, 1, 6—1} and aj — for sufficiently large j, e.g. the sum is actually finite. We 
then define 

oo 

<t> b (n) = J2 a ^ 3 - ( 102 ) 

3=0 

The sequence 0&(1), <fo(2), is called the van der Corput sequence in base b and used as a 
building block for the Halton sequence defined by 

x n = {(t>b 1 (n),...,(f>b d (n)) . (103) 

If one uses the first d prime numbers for the bases bi, bd one can show the star discrepancy of 
the Halton sequence satisfies for all N > 2 

< A j^m +0 (^py (104, 

The coefficient Ad depends only on the first d prime numbers p%, pd and is given by 

d 



Pk - 1 

-l 

One can show that 



l im = 1, (106) 



dlnd 



e.g. Ad grows stronger than exponentially as the number of dimensions increases. This is the major 
drawback of the Halton sequence and limits its applications in quasi-Monte Carlo integration to 
low dimensions. 

4.2.3 Sobol sequences 

Sobol sequences |Tg[ ] are obtained by first choosing a primitive polynomial over Z2 of degree g: 

P = x 9 + a 1 x g - 1 + ... + a g -!X + l, (107) 
where each ai is either or 1. The coefficients a\ are used in the recurrence relation 

Vi = OiUi-i © a 2 Vi-2 © ... © a 9 _iWi_ s+ i © v^ g ® [v^g/2 9 ], (108) 
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where © denotes the bitwise exclusive-or operation and each Vi is a number which can be written 
as Vi = rrii/(2 l ) with < m, < 2*. Consequetive quasi-random numbers are then obtained from 
the relation 



X n +1 



x n ffi v c 



(109) 



where the index c is equal to the place of the rightmost zero-bit in the binary representation of 
n. For example n — 11 has the binary representation 1011 and the rightmost zero-bit is the third 
one. Therefore c = 3 in this case. For a primitive polynomial of degree g we also have to choose 
the first g values for the m,. The only restriction is that the rtii are all odd and mi < 2\ There 
is an alternative way of computing x n : 



givi ffi 32^2 © 



(110) 



where ... (7317231 is the Gray code representation of n. The basic property of the Gray code is that 
the representations for n and n + 1 differ in only one position. The Gray code representation can 
be obtained from the binary representation according to 



■333231 = ...636261 © ...646362- 



Exercise 15: Find the Gray code representation for n = 0, 1, 7. 
Hint: Start from the binary representation and use the formula eg. 
Ill, 101, 100. 



Ill 



(111) 



You should find 0, 1, 11, 10, 110, 



The star discrepancy of the Sobol sequence satisfies 



D* < B d 



\u d N 
N 



O 



ln^" 1 N 
N 



(112) 



where 



and 



B, = 



d\(ln2) 



dlnd dlnd dlnlnc? 
k— — : < r d < - r -^- H ^ h o(dlnlnd) 



In In d 



In 2 



In 2 



(113) 



(114) 



and asymptotically one finds hiBd = O(dlnlnd) and B d increases also stronger than exponentially 
with d. 



4.2.4 Faure sequences 

Let d > 3 and let 6 be the first prime number greater or equal to d. We describe here the 
construction of the Faure sequence x%, X2 7 ■■■ M . Let ut be the components of the n-th point in 
the sequence: x n — (u\, U2, Ud). One starts from the digit representation of n — 1 in the base 6: 

00 

ri-i = J2 a i bj - ( 115 ) 

r=0 

ui is given by 

00 

ui = (p b (n- 1) = ^ajh- 1 -! (116) 
j=o 

and U2, U3, Ud are obtained recursively from 

u l+1 = C{ui), 1 < i < d. (117) 
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If v = Vjb 1 3 and w — ^Wjb 1 ■? then the transformation w — C(v) is defined by 
It can be shown that the star discrepancy of such a sequence satisfies 

D . < ft !^ + o(!^) , 119) 

where 

- a(£i)'- d») 

Asymptotically Cd goes to zero. 
4.2.5 Niederreiter sequences 

Let <i > 3 and let 6 be a prime power greater or equal to d. As before we use the digit representation 
for n in the base 6 



Y^a r b r . (121) 



r=0 



The n-th element x n — (ui, ...,M<j) of the Niederreiter sequence pfl] is given by 



= E 6 "M E ( ill ) mod 6 ] , (122) 

3=1 \r=j— 1 



where the Cj are distinct elements from the set 0, 1, 6—1. Note that the sums are actually finite. 
It can be shown that the star discrepancy of such a sequence satisfies 

V <_ ^ + (^Y ( 12 3, 



where 



Further 



1 6-1 f[b/2]Y 



Dd = ^wm\i^) • (124) 



lim-^- < -1 (125) 
d~>oo dm in a 

and the coefficients decrease stronger than exponentially in the limit d — > oo. 
4.3 Summary on random numbers 

Monte Carlo integration relies on pseudo-random number generators. We reviewed some of the 
most widely used algorithms for the generation of random numbers. Pseudo-random number 
generators might introduce correlations between succesive generated numbers. It is therefore rec- 
ommended to check a Monte Carlo integration by repeating the same calculation with a different 
generator. 
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Quasi-random numbers are deterministic and designed to sample a d-dimensional space as uni- 
form as possible. If the integrand is sufficiently smooth, the error bound is deterministic and will 
decrease like 

A ln d N „fln d - 1 N\ ,, n ^, 
A — + {-^)- < 126 > 

If the integrand has discontinuities (for example due to complicated integration boundaries, which 
have to be embedded into a hypercube) the theorem on the error bound is no longer valid and one 
usually estimates the error term like in Monte Carlo integration with pseudo-random numbers. 
The coefficient A of the leading term of the error estimate depends on the dimension d and goes for 
d — > oo to infinity for the Halton and Sobol sequences, and to zero for the Faure and Niederreiter 
sequences. Very little is known about the subleading terms. Explicit simualtions with roughly 
N = 10 5 points by F. James, J. Hoogland and R. Kleiss |22| show that the quadratic discrepancy 
of these sequences is better than the one from pseudo-random numbers provided d < 12, and 
approaches the latter one in higher dimensions. 

Further reading: The book of Knuth |p"l|| contains an excellent introduction to pseudo-random 
numbers, further I also used the review article by James [[l2| in the preparation of this section. 
The article by I. Vattulainen et al. (ll| contains a comparison of various pseudo-random number 
generators in several tests. 

The section on quasi-random numbers is mainly based on the book by Niederreiter | pfj| . Fox and 
Bratley and Fox pl} | give a numerical implementation of several quasi-random number generators. 

5 Generating samples according to a specified distribution 

Quite often one needs random numbers which are distributed according to a specified probability 
density function p{x). We will denote the cumulative distribution function by 

P{x max ) = J p(x)dx, (127) 

— oo 

with appropriate generalizations in higher dimensions. P(x max ) gives the probability that x < 
Xmax- In the previous section we have seen how to generate pseudo-random or quasi-random 
numbers which are uniformly distributed in the interval [0,1]. The problem can therefore be 
specified as follows: Given a sequence of random numbers, uniformly distributed in [0, 1], find a 
transformation such that the resulting sequence is distributed according to p(x). 

5.1 General algorithms 

5.1.1 The inverse transform method 

We describe the inverse transform method in one dimension. Let a; be a random variable dis- 
tributed with density p{x). The distribution function P(x) takes values in the interval [0, 1]. Let 
u be a random variable uniformly distributed in [0, 1]. We set 

x = P-\u). (128) 

For the differentials we have 

p(x)dx = du. (129) 

In order to use the inverse transform method we have to know the function P _1 (zt). This will not 
always be the case. 

One application of the inverse transform method is importance sampling. 
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5.1.2 Acceptance-rejection method 

The acceptance-rejection method, developed by von Neumann, can be used when an analytic form 
of P(x) is not known. We assume that we can enclose p{x) inside a shape which is C times 
an easily generated distribution h(x). Very often one chooses h(x) to be a uniform distribution 
or a normalized sum of uniform distributions. Since p{x) < Ch(x) and both p{x) and h(x) are 
normalized to unity, one has C > 1. One first generates x according to the distribution h(x) 
and calculates then p(x) and Ch{x). Secondly, one generates a random number u, uniformly 
distributed in [0, 1], and checks uCh(X) < p(x). If this is the case, one accepts x, otherwise one 
rejects x and starts again. The efficiency of this method is 1/C, therefore one tries to choose h(x) 
to be as close to p(x) in shape as possible. 

5.1.3 Applications 

One often encounters the situation that one needs random variables distributed according to a 
Gaussian distribution: 

1 

P (x) = -^ e -^^. (130) 

The Box-Muller algorithm gives a prescription to generate two independent variables x\ and 22, 
distributed according to a Gaussian distribution with mean \i = and variation a 1 = 1 from two 
independent variables u\ and 112, uniformly distributed in [0, 1]: 

xi = \f— 2 In Mi cos(27rct2), 

X2 = y— 2 lnwi sin(27r?/2). (131) 

Exercise 16: Prove this statement. 
Hint: You should show that 

du\du2 — —e~ 2 dx\dx2 (132) 
and that x\ and X2 are independent given the fact that u\ and 112 are independent. 

Algoritms for generating many different distributions are known and can be found in p7| . We 
have collected some "cooking recipes" in the appendix B. 

5.2 The Metropolis algorithm 

In practical application one often wants to generate random variables according to some probability 
density p{x\ 1 ...,Xd), which not necesarrily factorizes. The methods described so far are in most 
cases insufficient. In practice one often uses the Metropolis algorithm [^3| for generating random 
samples which are distributed according to a multivariate probability density p(x\, Xd) where d 
is large. Let us call the vector <f> — (xi, Xd) a state of the ensemble, which we want to generate. 
Within the Metropolis algorithm one starts from a state 4>q, and replaces iteratively an old state 
by a new one, in such a way, that the correct probability density distribution is obtained in the 
limit of a large number of such iterations. The equilibrium distribution is reached, regardless of 
the state one started with. Once the equilibrium distribution is reached, repeated application of 
the algorithm keeps one in the same ensemble. In short, the desired distribution is the unique fix 
point of the algorithm. Two important conditions have to be met for the Metropolis algorithm to 
work: Ergodicity and detailed balance. Detailed balance states that the transition probabilities 
W((j)i — > 4*2) and W{4>2 ~ > 4>i) obey 

p(0i)W(0i - fa) = pifoWifo^fa). (133) 

Ergodicity requires that each state can be reached from any other state within a finite number of 
steps. Given a state <f>i, one iteration of the Metropolis algoritm consists of the following steps: 
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1. Generate (randomly) a new candidate <j)' . 

2. Calculate AS = -ln.(p(0')/j>(0i)). 

3. If AS 1 < set the new state fa = </>'■ 

4. If AS > accept the new candidate only with probability p{(j>') / p{4>) , otherwise retain the 
old state fa . 

5. Do the next iteration. 

Step 3 and 4 can be summarized that the probability of accepting the candidate fa is given by 
W(fa —> fa) — min(l,e s ). It can be verified that this transition probabilty satisfies detailed 
balance. The way how a new candidate fa suggested is arbitrary, restricted only by the condition 
that one has to be able to reach each state within a finite number of steps. 

The Metropolis algorithm also has several drawbacks: Since one can start from an arbitrary 
state it takes a number r f of steps to reach the desired equilibrium distribution. It can also be 
possible that one reaches a metastable state, after which one would get out only after a large 
(and generally unknown) number of steps. One therefore starts the Metropolis algorithm with 
several different initializations and monitors if one reaches the same equilibrium state. Further- 
more, once the equilibrium is reached, succesive states are in general highly correlated. If one is 
interested in an unbiased sample of states fa, one therefore generally discards a certain number 
Td of states produced by the Metropolis algorithm, before picking out the next one. The number 
of time steps Td is called the decorrelation time and is of order Td = £ 2 , where £ is a typical 
correlation lenght of the system. This is due to the fact that the Metropolis algorithm updates 
the states locally at random. One therefore performs a random walk through configuration space, 
and it takes therefore £ 2 steps to move a distance £ ahead. In certain applications for critical phe- 
nomena the correlation lenght £ becomes large, which makes it difficult to obtain unbiased samples. 

Since new candidates in the Metropolis algorithm are picked out randomly, the "random-walk- 
problem" is the origin of inefficiencies to obtain the equilibrium distribution or to sample uncorre- 
cted events. One way to improve the situation is to use a-priori probabilities |2j|. The transistion 
probability is written as 



where A(fa — > fa ) is the a-priori probability that state fa is suggested, given the fact that the 
system is in state fa. The probability of accepting the new candidate is now given by 



The bias we introduced in suggesting new candidates is now corrected by the acceptance rate. 
By choosing the a-priori probabilities carefully, one can decrease the rejection rate and improve 
the efficiency of the simulation. As an example consider the case where a state <j> — (x\, ...,%d) 
describes a dynamical system, like a molecular gas. In this case a discretized or approximative 
version of the equations of motion can be used to suggest new candidates and the Metropolis 
algorithm might converge faster. The combination of molecular dynamics (e.g. the use of the 
equations of motion) with Monte Carlo methods is sometimes called hybrid Monte Carlo. 

5.2.1 Numerical simulations of spin glasses 

The Ising and the Potts model are widely used in statistical physics. Both describe an ensemble 
of spins interacting with each other through next-neighbour interactions. The Hamiltonian of the 
Ising model is given by 



W(fa - fa) 



A{<t> x fa)W(fa -> fa). 



(134) 




(135) 



s/ ng 




(136) 



(id) 
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where J > is a constant, denotes the sum over all next-neighbours and the spin Si at site 
i takes the values ±1. The Hamiltonian of the Potts model is given by 



Hp 



(id) 



1 



(137) 



Here each spin 5j may take the values 1, 2, q and <5 a h denotes the Kronecker delta. We 
have also allowed the possibility of different couplings Jy depending on the sites i and j. Let 
(f> = (Si, S2, Sn) denote a state of the model. Observables are given by 



(O) 



-H(4,)/kT 



-H(<j,)/kT 



(138) 



where the sum runs over all possible states. One observable is the magnetization M or the order 
parameter. For the Ising model the magnetization is given by 



Mi 



sing 



and for the Potts model by 



M 



Potts 



q max(iV Q ) — 1 
9 = 1 : 



(139) 



(140) 



where N a is the number of spins with value a. A second observable is the magnetic susceptibility 
given by 



X 



(Mf 



(141) 



Exercise 17: Write a Monte Carlo program which simulates the two-dimensional Ising model on a 16 x 16 
lattice with periodic boundary conditions using the Metropolis algorithm. (Note that with a 16 x 16 lattice, 
the total number of states is 2 256 . This is far too large to evaluate the partition sum by brute force.) You 
may initialize the model with all spins up (cold start) or with a random distribution of spins (hot start). 
Step through the 256 spins one at a time making an attempt to flip the current spin. Plot the absolute 
value of the magnetization for various values of K = J/kT between and 1. Is anything happening around 
K = 0.44 ? 



To simulate the Potts model it is more efficient to use the Swendsen-Wang algorithm (28) which 
can flip clusters of spins in one step, instead of the standard Metropolis algorithm, which has to 
build up a spin flip of a cluster through succesive steps. The Swendsen-Wang algorithm introduces 
for each neighbouring pair i and j a bond mj between the two spins. The bond variable mj can 
take the values and 1, the last value signifies that the two spins are "frozen" together. One 
iteration of the Swendsen-Wang algorithm works as follows: 

1. For each neighbouring pair i, j the spins are frozen with probability 

Pii = l-exp^-^^. (142) 

2. Having gone over all pairs, clusters are now formed in a second step: A cluster contains all 
spins which have a path of frozen bonds connecting them. Note that by constrution all spins 
in one cluster have the same value. 

3. For each cluster a new value for the cluster spin is chosen randomly and with equal proba- 
bility. 
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The reason why this algorithm works is roughly as follows: The Swendsen-Wang method enlarges 
the set of variables from the 5, to the set S, and mj. Further, the partition function of the Potts 
model can be rewritten as 

Z = exp(-H/kT) 

{s.} 

= E E II ((! - P ^ 5 ^ + Pi^,nJs iSi ) ■ (143) 
{s,)K1(.j) 

Integrating out the mj (e.g. marginalization with respect to the bonds Hy) one recovers the Potts 
model. 

5.2.2 Numerical simulations of quantum field theories 

Observables in quantum field theories are given by 

<o) = I^°(^- SW (144 ) 

where S((f>) is the Euclidean action of the field fa The basic idea of lattice field theory is to approx- 
imate the infinite dimensional path integral by a finite sum of field configurations fa. Since the 
factor e~ s ^ can vary over several order of magnitudes, simple random sampling will yields poor 
results, and one therefore uses importance sampling and chooses <j) according to some propability 
P{4>): 

N 

J2 0(fa)P-\fa)e- s ^ 

(O) « ^ . (145) 

£ P-i^e-W*) 
j=i 

The action S(fa) is also approximated by a finite sum. For example the discretized version of the 
action of </> 4 -theory 

S = J d d xf^-{d^)(d^) + \m 2 4> 2 + ^ (146) 
is given in dimensionless quantities by 

s(fa) = ^2 (e m*) W x + a A) + ^ x - a ^))) + <^) 2 + w*) 4 ^ > ( 14? ) 

where a is the lattice spacing and fi is a unit vector in the direction fi. The field has been rescalcd 
according to 



2Ka 1 - d l 2 <p dlscr (148) 

The parameters k and A are related up to correction of order a 2 to the original parameters m 2 
and g by 



m 2 a 2 = --2d, g a 4 - d =^. (149) 

K K 



The perfect choice for P{fa) would be P eq (fa) = e~ s ^*\ but this would require that we know 
the partition function analytically. One is therefore forced to construct a random walk through 
configuration space such that 

lim P{fa) = P eq {fa). (150) 
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This is usually done with the help of the Metropolis algorithm. One first chooses a random change 
in the field configuration <f>i — > tf>' i} calculates the change in the action AS — S{4>' i ) — S(4>) and 
the Metropolis transition probabilty W — min(l, exp(— AS*)). One then throws a random number 
u uniformly distributed in [0, 1] and accepts the new configuration if u < W, otherwise is 
rejected. 



5.3 Generating phase space for particle collisions 

Observables in high-energy collider experiments are often of the following form 

f M 

O = J d$ n (p a +Pb,Pl,...,Pn) 8K ( s j @(0,Pl,--,Pn), (151) 

where p a and p& are the momenta of the incoming particles, the outgoing particles are denoted by 
the labels l,...,n. The Lorentz-invariant phase space is denoted by d<f> n , 1/(8K (s)) is a kinematical 
factor with s = (p n + Pb) 2 which includes the averaging over initial spins (we assumed two spin 
states for each initial particle), M is the relevant matrix element squared and 0(0, pi, ...,p n ) is a 
function which defines the observable and includes all experimental cuts. The situation described 
above corresponds to electron-positron annihilation. If hadrons appear in the initial state there 
are slight modifications. The exact matrix element squared is often impossible to calculate. Event 



generators like HERWIG 31 or PYTHIA |32| approximate the matrix element squared through 
a three-stage process: First there is a perturbative hard matrix element for the hard subprocess, 
where only a few partons are involved. Secondly, the partons originating from the hard scattering 
process are then allowed to radiate off additional partons. This stage is usually called parton 
showering and results in a cascade of partons. Finally the resulting partons are converted into 
observable hadrons with the help of a phenomenological model. Perturbative calculations follow a 
different philosophy: First, they restrict the set of observables to ones which are infrared safe, e.g. 
which can reliable be calculated in perturbation theory. Secondly, they are based on parton-hadron 
duality and calculate an observable in the parton picture. This amounts to say, that one assumes 
that hadronization corrections are small. The matrix element squared M. is then calculated order 
by order in perturbation theory. The leading order term can be more or less obtained from 
automized procedures, the standard of today is a next-to-leading order calculation, the frontier of 
tomorrow are next-to-next-to-leading order calculations. Bot in perturbative calculations and in 
event generators the problem arises to generate the four-momenta of the outgoing particles. The 
Lorentz-invariant phase space d$„ for n particles with momenta pi, p n and masses mi, m n 
is given by 

n ,4 / n \ 

<m>„(p,p„..,pJ = n^e(rf)%?-m?)(2») 4 Wp-y>) 

- n<4fe (2 * ,v ( p -|>)- (152) 

The phase space volume for massless particles mi = ... = m n = is 



4„ = / d$ 



< 2 -» 4 - 3 "(5) " mfcrr (153 » 



The phase space factorizes according to 



d$ n (P,pi,...,p„) = ^-dQ 2 d<f> j {Q, Pl ,...,p J )d<I> n - j+1 (P,Q,p : i +1 ,...,p n ), (154) 



3 

where Q = Pi 

i+l 
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5.3.1 Sequential approach 

One possibility to g enerate the n-particle phase space is to consider sequentially two-body decays 
p3| . Using eq. 154 one arrives at 

rf$„ = ^ )n _ 2 dM 2 l _ l ...dM 2 d$> 2 {n)...d<S> 2 (2) (155) 

i 

with M 2 = qf, qi = ^2 p% and ^$2(2) = d<f>2(qi,qi-i,Pi)- The allowed region for the invariant 

3=1 

masses Mi is given by (mi + ... + mi) 2 < M 2 < (Mj+i — m^+i) 2 . In the rest frame of qi the 
two-particle phase space d$ 2 (qi, Qi-iiPi) is given by 



d$> 2 (qi,qi-i,pi) = -—^ —2 dcpid(cos6i), (156) 



(2^ 

where the triangle function X(x, y, z) is defined by 

X(x,y,z) = x 2 + y 2 + z 2 - 2xy - 2yz - 2zx. (157) 
This suggests the following algorithm: 

1. Set i — n, qi — P and Mi = y/qj. 

2. Transform to the rest frame of qi. 

3. Use two random variables un, Ui2 and set ipi — 2irun, cos9i = Ui 2 . 

4. If i > 3 use a third random variable and set -Mj_i = (mi + ... + rrii—i) + Uis(Mi — rrii), 
otherwise if i = 2 set M\ = m\. 

5. Set 

J\(M 2 ,Ml l7 m 2 ) 

K'l = 2^ < 158 > 

and pi ' = |ft '| • (sin^j sin</?j, sin^j cosyjj, cos^j). Set further 

Pi = (y/\Pi'\ 2 +mlpi'), <&-i = (y/m , \ 2 +M?_ 1 ,-jl i '). (159) 

6. Transform back to the original Lorentz system. 

7. Decrease i = i — 1. If i > 2 go back to step 2, otherwise set pi = qi. 
The weight of the generated event depends on the event and is given by 

1 " JxiMf^Mf^m 2 ) 

- ^ W-'-^I T m, • (160) 

Exercise 18: Lorentz transformation. Let q be a timelike fourvector with q 2 = M 2 . Denote by S the 
coordinate system in which q is given, and by S' the rest frame of q, e.g. in S' we have q' = (M, 0,0,0). 
Let p be an arbitrary fourvector in S and denote the corresponding representation in S' by p' . Show that 
p and p' are related by 

Pt = l(p't + ^f), P = ^' + «((7-l)|jf +7^), (161) 
where 7 = qt/M. Show also that the inverse transformation is given by 

Pt = 7 (pt ~ , p' =P + g^(7- ■ ( 162 ) 
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5.3.2 Democratic approach 

Where as the sequential algorithm discussed above generates an event with a weight depending 
on the generated fourvectors, it is sometimes desirable to have an algorithm which sweeps out 
phase space (almost) uniformly. The RAMBO-algorithm (34) maps a hypercube [0, l] 4n of random 
numbers into n physical four-momenta with center-of-mass energy V P 2 . Massless fourvectors can 
be generated with uniform weight, and we discuss this case first. 

Let P = (P, 0, 0, 0) be a time-like four-vector. The phase space volume for a system of n massless 
particles with center-of-mass energy \fP 2 is 

To derive the RAMBO-algorithm one starts instead from the quantity 

(oo \ ™ 

jxf(x)dxj . (164) 

The quantity R n can be interpreted as describing a system of n massless four-momenta that are 
not constrained by momentum conservation but occur with some weight function / which keeps 
the total volume finite. The four-vectors arc then related to the physical four-momenta p^ by 
the following Lorentz and scaling transformations: 



where 



(79° + £•&), Pi = x (qi + hi +a(b- tj , (165) 

n 1 

Q"=5>f, M = y/O*, b=- — Q, 

i=l 

7= — = Jl + b 2 : a= , x = - . (166) 

' M v ' 1 + 7 M K ' 



Denote this transformation and its inverse as follows 



P ? = xH£( qi ), q^-H'Jpi). (167) 



By a change of variables one can show 

■(n/(;«° S ta)))^«- 

With the choice f(x) = e~ x the integrals over b and x may be performed and one obtains 

Rn = ®n-S n (169) 



(168) 



with 



rfxa-n* v J)r(n-l)r(2n) 



S n = 2n(Py- n (170) 

This gives a Monte Carlo algorithm which generates massless four-momenta p^ according to the 
phase-space measure (|163|). The algorithm consists of two steps: 
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1. Generate independently n massless four- momenta q? with isotropic angular distribution and 
energies qf distributed according to the density q®e~ qi dqf. Using An random numbers Ui 
uniformly distributed in [0, 1] this is done as follows: 

Ci = 2u H - 1, ifi = 2nui 2 , q° = - ln(w l3 it l4 ), 

it = 4\j^-c 2 t COSip u qf = qVy/l-cf sin<^, qf = q°c t . (171) 

2. The four-vectors gf are then transformed into the four-vectors , using the transformation 

dH). 

Each event has the uniform weight 

»0 - (2 ,)«"(f)"- 1 F |2rl_. (172) 

Phase-space configurations corresponding to massive particles can be generated by starting from a 
massless configuration and then transforming this configuration into one with the desired masses. 
This is done as follows : Let p% be a set of massless momenta. One starts again from the phase- 
space integral for massless particles 

. n ,4 / n \ 

The are transformed into the four-momenta k¥ as follows : 



$ = +eip\)\ *i = ®, (174) 

where £ is a solution of the equation 



i=i 

It should be noted that in the general case no analytic expression for £ exists and £ has to be 
computed numerically. After some manipulations one arrives at 



MM) 



/n ,4, / n \ 

n j^f e ^ k2 i - ™* )( 2 -) 4 * 4 ( p - e fc J • ^(w. «)> 



(176) 

where the weight is given by 



M'r, 



( ^(S l6l )"(n|)gf) _ ' 



In contrast to the massless case, the weight is no longer constant but varies over phase space. 

To generate events with massive particles in the final state one therefore proceeds as follows: 
1. Generate an event with n massless particles. 



2. Solve eq. 175 numerically. 



3. Use eq. 174 to obtain the momenta of the massive particles. 
The weight of such an event is then given by 

w = w m w (178) 



with wq and w m defined in eq. 172 and eq. 177, respectively. 
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5.3.3 Generating configurations close to soft or collinear regions 

In massless QCD individual terms in the leading order squared matrix element are of the form 

Tfl 

M = . (179) 

Si 1 i 2 Si 2 i 3 ---Si k _ l i k 

Integration over phase space leads to collinear singularities when one of the s^- approaches zero, 
and to soft singularities when two adjacent invariants Sj._ 1 i j and Si j Si j+1 approach zero simul- 
taneously. Event generators use therefore a cutoff s m , n to avoid this region. In perturbative 
calculations these singularities cancel against similar divergences encountered in loop integrals, 
but the cancelation occurs only after the phase space integration has been performed. On the 
other hand the phase space integrations have to be done numerically. Two approaches exist to 
handle this situation: phase space slicing and the subtraction method. Within the slicing ap- 
proach the phase space is split into a region s > s m i n , where numerical integration is performed, 
and a small region s < s m i n , where after approximation of the integrand one integration can 
be performed analytically, and the resulting singularities cancel then against the corresponding 
ones in the loop amplitudes. Within the subtraction method one adds and subtracts a suitable 
chosen term, such that each integration (e.g. the integration over the real emission part and the 
integration over the virtual correction) is finite. Nevertheless it will also be useful within the sub- 
traction method to split the integration in the real emission part in two pieces in order to improve 
the convergence of the Monte Carlo integration: One region s < s m i n which will give a small or 
negligible contribution and a region s > s m i n . 



In short in any case we are dealing with integrals of the form J dsf(s)/s, with lower bound- 
ary s m i n . To improve the efficiency of the Monte Carlo integration it is desirable to remap the 
1/s-behaviour of the integrand. In the simple example above this could be done by a change of 
variables y = Ins. In real problems there is more than one invariant s and the remapping is done 
by relating a (n + l)-parton configuration with one soft or collinear parton such that s as s s b is the 
smallest product of all adjacent products to a "hard" n-parton configuration |35|. In the region 
where s as s s b is the smallest product, we remap the phase space as follows: Let k' a , k s and k' b be 
the corresponding momenta such that s as = (k' a + k s ) 2 , s s b — (k' b + k s ) 2 and s a b = (k' a + k s + k' b ) 2 . 
We want to relate this (n + 1) particle configuration to a nearby "hard" n-particle configuration 
with (k a + kb) 2 = (k' a + k s + k' b ) 2 , where k a and fc& are the corresponding "hard" momenta. Using 
the factorization of the phase space, we have 

dK 2 

d$ n+1 = d^ n . 1 —-d^ 3 (K,k' a ,k s ,k' b ). (180) 
Ztt 

The three-particle phase space is given by 

d$ 3 (K, k' a ,k s ,k' b ) = 1 — ds as ds sb dn' b d(j) s 

32(27r) J s a fc 



= T3 ds as ds sb d(f> s d$2(K,ka,k b ) (181) 

4^Z7Tj S a b 

and therefore 

« ,-f, ds as ds s bd(j) s 

= d$ " 4(2^^ ■ < 182 ) 

The region of integration for s as and s s b is s as > s m i ni s s i, > s m i n (since we want to restrict the 
integration to the region where the invariants are larger than s m i n ) and s as + s s {, < s ab (Dalitz 
plot for massless particles). It is desirable to absorb poles in s as and s s b into the measure. A naive 
numerical integration of these poles without any remapping results in a poor accuracy. This is 
done by changing the variables according to 

Sas = Sab , S sb = S a b , (183) 

V S a b J V Sab J 
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where < u\, u 2 < 1. Note that u\, u 2 > enforces s as , s sb > s m i n . Therefore this transformation 
of variables may only be applied to invariants Sij where the region < Sij < s m i n is cut out. The 
phase space measure becomes 

d$ n+ i = d$ n 1 SasSsb m 2 ( fl^) Q(s as + s sb < s ab )du 1 du 2 d(j) s . (184) 
4{Zir) a s ab \ S a b J 

This give the following algorithm for generating a (n + l)-parton configuration: 

1. Take a "hard" n-parton configuration and pick out two momenta k a and k b . Use three 
uniformly distributed random number ui,u 2 ,u 3 and set 



s ab — {k a ~\~ fcfc) > 

Ml 

Sas = Sab 



Sab 



s 1 U2 

Or. 



Ssb — $ab 

Sab 

<j> 8 = 2iru 3 . (185) 

2. If (s as + s s b) > s a b, reject the event. 

3. If not, solve for k' a , k' b and k s . If s as < s s b we want to have k' b — > kb as s as — > 0. Define 

^ 5 ab s sb 7-^ S as -\- S s b S a b S as . , 

E a = , E s = , E b = , (186) 

6» ah = arccos ^1 - Sab 2J ^ & - 0*6 = arccos ^1 - T^r^r) ■ ( 187 ) 

It is convenient to work in a coordinate system which is obtained by a Lorentz transformation 
to the center of mass of k a + kb and a rotation such that k' b is along the positive z-axis. In 
that coordinate system 

p' a = E a (l,sm6 ab cos(4> s +7r),sm6> afc sin(0 s + 7r), cos 6> ab ), 
Ps = E s (l,sm6 sb cos(j) s ,sm6 sb sm<f> s ,cos6 sb ), 

Pb = E b (l, 0,0,1). (188) 

The momenta p' a , p s and p' b are related to the momenta k' a , k s and k' b by a sequence of 
Lorentz transformations back to the original frame 

k' a = AboostAxy(4>)kxz(0)p a (189) 

and analogous for the other two momenta. The explicit formulae for the Lorentz transfor- 
mations are obtained as follows : 



Denote by K = y 7 (k a + k b ) 2 and by p b the coordinates of the hard momentum kb in the 
center of mass system of k a + kb. Pb is given by 

(Q° ,0 h ■ Q - / k b .Q k° b \ J\ 
Pb = {-K kb -— kb+ l K(Q° + K) -K) Q ) (190) 

with Q = k a + k b . The angles are then given by 

(, _ Pb-P' b \ 

pi, 



arccos 



arctan I ^| ) . (191) 
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The explicit form of the rotations is 



A, z (0) 



A xy(</>) = 



/ 1 










\ 





cos 8 


sin 


(9 










1 






V o 


— sin 6 


cos 


6 


/ 













\ 





COS (j> 


— sin<^> 










sin <j) 


cos (j) 







V o 








1 


) 



The boost k! 




with Q = k a + fcb and _ftT = \J (k a + k ) 2 . 

4. If s as > s s b, exchange a and b in the formulae above. 

5. The "soft" event has then the weight 

7f 1 Sas^sb 



2 (2tt) 3 s ab 



In' 



Sab 



(192) 



(193) 



(194) 



where w n is the weight of the original "hard" event. 



5.4 Summary on generating specific samples 

In this section we presented some methods to generate samples according to a specified distribu- 
tion. If the inverse of the cumulative distribution function is known, the inverse transform method 
can be used, otherwise one often relies on the acceptance-rejection method. For high-dimensional 
problems the Metropolis algorithm is a popular choice. It is often applied to spin glasses and to 
lattice field theories. 



In pcrturbative calculations or event generators for high energy physics one often wants to sample 
the phase space of the outgoing particles. We reviewed the standard sequential approach, the 
democratice approach and a method for generating soft and collinear particles. The sequential 
approach has an advantage if peaks due to massive particles occur in intermediate stages of the 
process. By choosing the Mf cleverly, VEGAS can take the invariant mass of the resonance along 
one axis and easily adapt to the peak. The democratic approach has the advantage that each event 
has a uniform weight and has its application in massless theories. The method for generating soft 
and collinear particles is, as the name already indicates, designed to generate efficiently phase 
space configurations, where one particle is soft or collinear. This is important for example in NLO 
calculations where otherwise most computer time is spent in calculating the contributions from 
the real emission part. 

Further reading: The lecture notes by W. Krauth |^5| and A.D. Sokal [^6) deal with applica- 
tions of Monte Carlo methods in statistical physics and are worth reading. Further there are 
review articles on the Metropolis algorithm by G. Bhanot H| and on the Potts model by F.Y. Wu 
p7l . There are many introductions to lattice field theory, I borrowed the ideas from the review 
article by R.D. Kenway The book by Byckling and Kajantie |53| is the standard book on 

phase space for final state particles. 

Exercise 19: Application of the Potts model to data clustering. This technique is due to M. Blatt, S. 



37 



Wiseman and E. Domany \29J . Given a set of data (for example as points in a d-dimensional vector 
space), cluster analysis tries to group this data into clusters, such that the members in each cluster are 
more similar to each other than to members of different clusters. An example is an measurement of four 
different quantities from 150 Iris flowers. (In this specific example it was known that each flower belonged 
to exactly one subspecies of the Iris: either Iris Setosa, Iris Versicolor or Iris Virginica. The four quanti- 
ties which were measured are the petal length, the petal width, the sepal length and the sepal width.) Given 
the data points alone, the task is to decide which flowers belong to the same subspecies.) M. Blatt et al. 
suggested to use the Potts model for this problem as follows: Given N points x% in a d-dimensional vector 
space one chooses a value q for the number of different spin states and a model for the spin-spin couplings 
Jij = fidij), where dij is the distance between the points Xi and Xj and f a function depending on the dis- 
tance dij. To minimize computer time one usually chooses /(dy) = for dij > d cut . One then performs a 
Monte Carlo simulation of the corresponding Potts model. At low temperature all spins are aligned, where 
as at high temperature the orientation of the spins is random. In between there is a super-paramagnetic 
phase, where spins are aligned inside "grains", but the orientation of different grains is random. Us- 
ing the Swendsen- Wang algorithm one first tries to find this super-paramagnetic phase by monitoring the 
magnetic susceptibility (there should be a strong peak in \ a ^ the ferromagnetic- superparamagnetic phase 
transition, furthermore, at higher temperature there will be a significant drop in the susceptibility where 
the alignment of the spins inside a grain break up.) Having found the right temperature one then calculates 
the probability that two neighbouring sides are in one Swendsen- Wang cluster. If this probability is greater 
than 1/2 they are assigned to the same data cluster. The final result should depend only mildly on the 
exact value of q and the functional form of f (dij). This can be verified by performing the simulation with 
different choices for these parameters. Write a Monte Carlo simulation which clusters the data according 
to the algorithm outlined above. You can find the data for the Iris flowers on the web, for example at 
http: / '/ www. math.uah. edu / stat/data / Fisher, htm . 



A Orthogonal polynomials 

We list here some properties of the most common known orthogonal polynomials (Legendre, 
Tschebyscheff, Gegenbauer, Jacobi, Laguerre and Hermite). For each set we state the standard 
normalization and the differential equation to which the polynomials are solutions. We further give 
the explicit expression, the recurrence relation as well as the generating function and Rodrigues' 
formula. The information is taken from the book by A. Erdelyi |3^| . 

A.l Legendre polynomials 

The Legendre polynomials P n (x) are defined on the interval [—1, 1] with weight function w(x) = 1. 
The standard normalization is 

l 

2 

(lxP n (x)P m (x) = 2n T ^ nm- (195) 
-1 

The Legendre polynomials are solutions of the differential equation 

(l-x 2 )y" -2xy' + n(n + l)y = 0. (196) 
The explicit expression is given by 

Pn (x) = ^E(-ir(^)( 2 "; 2m )-- 2m , (197) 

where [n/2] denotes the largest integer smaller or equal to n/2. They can also be obtained through 
the recurrence relation: 

P {x) = l, Pi{x)=x, 

(n + l)P n+1 (x) = (2n + l)xP n {x) - nP„_i(s). (198) 
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Alternatively one may use the generating function 

1 



J2 p n(x)z", -1<x<1, \z\<l, (199) 



VI - 2xz + z 2 n=0 
or Rodrigues' formula : 

PJx) = —,^—{x 2 -l) n . (200) 
A. 2 Tschebyscheff polynomials 

The Tschebyscheff polynomials of the first kind T n (x) arc defined on the interval [—1, f] with 
the weight function w{x) = 1/Vl — x 2 . They can be viewed as a special case of the Gegenbauer 
polynomials. Due to their special normalization we discuss them separately. They are normalized 
as 

l 

dx(l - x 2 r-^T n (x)T m (x) = |^ 2 ' n n to. ( 201 ) 

-l 

The Tschebyscheff polynomials are solutions of the differential equation 

(1 - x 2 )y" - xy' + n 2 y = 0. (202) 

The explicit expression reads 

[n/2] 

m=0 v ; 



The recurrence relation is given by 



T {x) = l, T 1 (x)=x, 

T n+1 (x) - 2xT n (x) - T n _i(x). (204) 



The generating function is 

f — xz 



l-2xz + z 2 

n=0 



J2 T n(x)z n , -1<x<1, \z\<l. (205) 



Rodrigues' formula reads 

(-!)»(! -x 2 ) 1 / 2 ^ g U ^-1/2 
n{) ~ 2«+ir(n+i) dx" [l X > 



(206) 



A. 3 Gegenbauer polynomials 

The Gegenbauer polynomials C%(x) are defined on the interval [—1,1] with the weight function 
w{x) = (1 — x 2 )^ 1 / 2 for (j> > —1/2. The standard normalization is 



l 



fdx(l-x 2 r- 1/2 ^(x)C^x) = ^^Ms nm . (207) 

The Gegenbauer polynomials are solutions of the differential equation 

(1-xV -(2n+l)xy' + n(n + 2fi)y = 0. (208) 
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The explicit expression reads 

c;w . JL'g ( _ ir £te) (2l) ^, 

1 (Ul m\(n — 2m)\ 

The recurrence relation is given by 

Cg(x) = l, C?(x) = 2/ix, 

(n + l)C£ +1 (x) = 2(n + n)xC^[x) - (n + 2 M - l)^^). (210) 
The generating function is 

1 oo 

(1 _ 2xz + z2r = E^(*> n > -1<*<L N<!' (211) 

^ ' n— 

Rodrigues' formula reads 



r , M (-l)"2"n!r( 2/ ,)r( M + n+l) g _ n+M _ 1/2 

nW r( M + i)r(n + 2 M )(i-x 2 )A-i/2^« ^ x ^ 



(212) 



Special cases : For /i = 1/2 the Gegenbauer polynomials reduce to the Legendre polynomials, e.g 

Cn /2 (x) =P n ( X ). 

The polynomials C„(x) are called Tschebyscheff polynomials of the second kind and denoted 
by U„{x). 

The case ji = corresponds to Tschebyscheff polynomials of the first kind. They cannot be 



normalized accoring to eq. 207 and have been treated separately above. 



A. 4 Jacobi polynomials 

The Jacobi polynomials Pn^\%) are defined on the interval [—1,1] with the weight function 
w(x) = (1 — x) a (l + x) 13 for a, (3 > —1. The standard normalization is given by 

(2n + a + (3 + l)n\l [n + a + p + 1J 

-l 

(213) 

Differential equation: 

(1 - x 2 )y" + (/3-a-(a + (3 + 2)x)y' + n(n + a + (3 + l)y = (214) 
Explicit expression: 

pt p \x) = ^ib( n t a )(n-i) (x - iyi ~ m ( x+i)m (2i5) 

m—0 

Recurrence relation : 



P^\x) = 1, P[^\x) = (l + ±(a + pfjx + ^(a-P), 

2 (n + 1) (n + a + /3 + 1) (2n + a + (3) P^\x) = 

= (2n + a + 13 + 1) ((a 2 - (3 2 ) + (2n + a + (3 + 2) (2n + a + (3) x) P^\x) 

-2 (n + a) (n + /?) (2n + a + (3 + 2) p££\x). (216) 
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Generating function: 

oo 

R-^l -z + R)- a (l + z + R)- fi = 2- a - l3 P^ f:) \x)z r ' 



n=0 



R= ^l-2xz + z 2 , -l<x<l, \z\<l. (217) 

Rodrigues' formula: 

( — 1 ) n d n 

P^\x) = - -- 1 ±- -,3— ({l-x) n+a (l + x) n+ P) (218) 

v ; 2 n n\{l- x) a {l + xY dx n u ) \ ) ) \ > 

In the case a = [3 the Jacobi polynomials reduce to the Gegenbauer polynomials with /j, = a + 1/2. 
The exact relation is 

(ajQ) _ r(2tt + l) Tja + l + n) a+ i 
Fn (X) ~ T(2a + l + n) T(a + 1) °" W ^ 

A. 5 Generalized Laguerre polynomials 

The generalized Laguerre polynomials (x) are defined on the interval [0, oo] with the weight 
function w(x) = x a e~ x for a > — 1. The standard normalization is given by 



J dxx a e- x Li a \x)L^(x) = T{n+ n °; + 1) S nm . 
o 

Differential equation: 



(220) 



xy" + {a + I - x)y' + ny = (221) 

Explicit expression: 

Recurrence relation: 

4 Q) (aO = l, L { i\ x ) = l + a-x, 

(n + l)L^(x) = ((2n + a + l)-x) L™(x) (n + o^L^x). (223) 
Generating function: 



(l-z)- Q - 1 cxp(^— T J = J2L^(x)z n , \z\<l. (224) 

Rodrigues' formula: 



n=0 



1 d n 

L^\x) = — (x n+a e- x ) (225) 

™ v ' n\x a e- x dx n v ' y ' 

Special cases : The polynomials L„\x) are the Laguerre polynomials and denoted by L n (x). 
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A. 6 Hermite polynomials 

The Hermite polynomials H n {x) are denned on the interval [—00,00] with the weight function 
w(x) — e~ x . The Hermite polynomials appear in the quantum mechanical harmonic oscillator. 
The standard normalization is given by 

dxe~ x2 H n {x)H m {x) = 2 n ^n\5 nm . (226) 

—00 

Differential equation: 

y"-2xy' + 2ny = (227) 

The explicit expression is given by 

H M = „! E (- 1 )"-L_L-^. ,228) 

m=0 v ' 



Recurrence relation: 



H (x) = l, H 1 (x) = 2x, 

H n+1 (x) = 2xH n (x) - 2nH n -i{x). (229) 



Generating function: 



e" 4 ' 



OO 1 

+2 X t = Y —H n {x)t n (230) 



n 

n=0 



Rodrigues' formula: 



H n (x) = (-l)"e* 2 £^e-* 2 (231) 



B Sampling some specific distriubtions 

We list here some cooking recipes how to generate samples according to some special distributions 
(Gaussian, % 2 , binomial, Poisson, gamma, beta and Student t). The algorithms are taken from 
Ahrens and Dieter |}7) and the particle data group J38|. 

B.l Gaussian distribution 

A random variable distributed according to the probability density function 

p(x) = —L^e'^ (232) 
V 27rcH 

is called normal or Gaussian distributed. As already mentioned in sect. 5.1.3] the Box-Muller 
algorithm allows one to generate two independent variables x\ and X2, distributed according to a 
Gaussian distribution with mean — and variation a 2 = 1 from two independent variables u\ 
and U2, uniformly distributed in [0, 1] as follows: 



x\ = \J —2 lniti cos(27rtt2), 

X2 = y—2 lniti sin(27i"U2)- (233) 
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B.2 ^-distribution 

If x%, x n are n independent Gaussian random variables with mean /ij and variance of for the 
variable Xi, the sum 

x = {Xl ~ 2 /it)2 (234) 

is distributed as a x 2 with n degrees of freedom. The x 2 ( n ) distribution has the probability density 
function 

HT~2 ^~ P "2" 

= — — (235) 

v ; 2fr(f) v ; 

where x > is required. The x 2 (^-distribution has mean n and variance 2n. To generate a 
random variable x which is distributed as x 2 ( n ) one can start from the definition and generate n 
Gaussian random variables. However there is a more efficient approach. For n even one generates 
n/2 uniform numbers Uj and sets 

x = — 21n (U1U2 • ... • u n /2) • (236) 
For n odd, one generates (n — l)/2 uniform numbers Ui and one Gaussian y and sets 

x = -2 In (uiu 2 ■ ... ■ U( n _i)/ 2 ) + y 2 . (237) 



B.3 Binomial distribution 

Binomial distributions are obtained from random processes with exactly two possible outcomes. 
If the propability of a hit in each trial is p, then the probability of obtaining exactly r hits in n 
trials is given by 

p ^ = i? ' ,/ '(l-P)"- r > (238) 
r!(n — rj! 

where r = 0,1,2,3,... is an integer and < p < 1. A random variable r distributed according 
to eq. |238| is called binomialy distributed. The binomial distribution has mean np and variance 
np(l —p). One possibility to generate integer numbers r — 0, n, which are distributed according 
to a binomial distribution, is directly obtained from the definition: 

1. Set r = and m = 0. 

2. Generate a uniform random number u. If u < p increase k = k + 1. 

3. Increase m = m + 1. If m < n go back to step 2, otherwise return 

The computer time for this algorithm grows linearly with n. The algorithm can be used for small 
values of n. For large n there are better algorithms |37l [38[. 



B.4 Poisson distirubtion 

The Poisson distribution is the limit n — >oo,p— >0, np = /i of the binomial distribution. The 
probability density function for the Poisson distribution reads 

p(r) = ^— j— , (239) 
r! 

where /1 > and r = 0, 1, 2, 3, ... is an integer. The Poisson distribution has mean u and variance 
[j,. For large fi the Poisson distribution approaches the Gaussian distribution. To generate integer 
numbers r = 0, 1, 2, ... distributed according to a Poisson distribution, one proceeds as follows: 

1. Initialize r = 0, A = 1. 

2. Generate a uniform random number u, set A = uA. If ^4 < accept r and stop. 

3. Increase r — r + 1 and goto step 2. 
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B.5 Gamma distribution 

The probability density function of the gamma distribution is given by 

fc — 1 \kp—\x 

»<*> = — m~ (240) 

with x > 0, A > and fc > is not required to be an integer. The gamma distribution has mean 
k/X and variance fc/A 2 . 

The special case fc = 1 gives the exponential distribution 

p(x) = Xe- Xx . (241) 

To generate random numbers distributed according to a gamma distribution, we consider the cases 
fc = 1, fc > 1 and < fc < 1 separately. For simplicity we use A = 1. Results for A ^ 1 arc easily 
obtained by dividing the resulting random number a; by A. 

The case fc = 1 is just the exponential distribution. One generates a uniform random number 
u and sets x = — ln(w). 

For the case < fc < 1 one uses a rejection technique: One first observes that the function 

9{x) = \W "t"- 1 ' ( 242 ) 
is a majorizing function for p(x). The function 

with e = 2.71828... being the base of the natural logarithm is a probability density proportional 
to g{x), which can easily be sampled. This yields the following algorithm: 

1. Set vi = 1 + k/e. 

2. Generate two uniform random number m, u 2 and set v 2 = V\U\. 

3. If v 2 < 1, set x = v^ k and accept x if u 2 < e~ x , otherwise go back to step 2. 

4. If v 2 > 1, set x = — ln((«i — v 2 )/k) and accept x if u 2 < otherwise go back to step 2. 
For the case fc > 1 one uses the majorization 

x k-i e -x 1 , k _ 1 , k -i e -(k-i) 

— s W) i + m ■ k>h (244) 

and obtains the following algorithm: 

1. Set b = fc - 1, A = fc + b and s = \[A. 

2. Generate a uniform random number u\ and set t = stan (7r(ui — 1/2)) and x = b + t. 

3. If a; < go back to step 2. 

4. Generate a uniform random number u 2 and accept x if 

w 2 < exp^ln(|) -t + ln^l + ^.^ , (245) 
otherwise go back to step 2. 
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B.6 Beta distributions 



The probability density function of the beta distribution is given by 

= m S > «<*<!. «./9> 0, (246) 

where B(a,(3) = T(a)T(p)/T(a + /?). Samples according to a beta distribution can be obtained 
from a gamma distribution. If x and y are two independent deviates which are (standard, e.g. 
A = 1) gamma-distributed with parameters a and /3 respectively, then x/(x + y) is B{a,[3) 
distributed. 



B.7 Student's t distribution 

If x and independent Gaussian random variables with mean and variance 1, the 

quantity 



1 zjn 



with z = ^x 2 l , (247) 



is distributed according to a Student's t distribution with n degrees of freedom. The probability 
density function of Student's t distribution is given by 



The range of t is — oo < t < oo, and n is not required to be an integer. For n > 3 Student's 
t distribution has mean and variance n/(n — 2). To generate a random variable t distributed 
according to Student's t distribution for n > one generates a random variable x distributed as 
Gaussian with mean and variance 1, as well as a random variable y, distributed as according to 
a gamma distribution with k — n/2 and A = 1. Then 

t = x^ (249) 
is distributed as a t with n degrees of freedom. 

In the case n = 1 Student's t distribution reduces to the Breit-Wigner distribution: 

v(t) . i^. (250) 

In this case it it simpler to generate the distribution as follows: Generate to uniform random 
numbers Ui, 112, set v\ — 2u\ — 1 and v 2 = 2u 2 — 1. If v\ + v\ < 1 set t = vi/v 2 , otherwise start 
again. 
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